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Abstract 



We present a robust regression estimator for longitudinal data, which is especially suited 
for functional data that has been observed on sparse or irregular time grids. We show by 
simulation that the proposed estimators possess good outlier-resistance properties com- 
pared with the traditional functional least- squares estimator. As an example of application, 
we study the relationship between levels of oxides of nitrogen and ozone in the city of San 
Francisco. 

Key Words: Functional data analysis; Longitudinal data analysis; Mixed effects mod- 
els; Robust statistics; Spline smoothing. 



1 Introduction 



In a typical longitudinal study, a number of variables are measured on a group of individu- 
als and the goal is to analyze the relationships between the trajectories of the variables. In 
recent years, functional data analysis has provided efficient ways to analyze longitudinal 
data. In many cases the variable trajectories are discretized continuous curves that can be 
reconstructed by smoothing, and functional linear regression methods can be applied to 
study the relationship between the variables (Ramsay and Silverman, 2005). But in other 
situations the data is observed at sparse and irregular time points, which makes smoothing 
difficult or even unfeasible. Therefore, functional regression methods that can be applied 
directly to the raw measurements become very useful. 

Methods for functional data analysis of irregularly sampled curves have been proposed 
by a number of authors, for the one-sample problem as well as for the functional regression 
problem (Chiou et al., 2004; James et al., 2000; Muller et al., 2008; Yao et al., 2005a, 
2005b). Outlier-resistant techniques for the functional one-sample problem have also been 
proposed (Cuevas et al., 2007; Gervini, 2008, 2009; Fraiman and Muniz, 2001; Locantore 
et al., 1999), and two recent papers deal with robust functional regression for pre-smoothed 
curves (Zhu et al. 201 1; Maronna and Yohai, 2012). However, outlier-resistant functional 
regression methods for raw functional data have not yet been proposed in the literature. In 
this paper we address this problem and present a computationally simple approach based 
on random-effect models. Our simulations show that this method attains the desired outlier 
resistance against atypical curves, and that the asymptotic distribution of the test statistic 
is approximately valid for small samples. 

As an example of application, we will analyze the daily trajectories of oxides of nitro- 
gen and ozone levels in the city of Sacramento, California, during the summer of 2005. 
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Figure 1 : Ozone Example. Daily trajectories of ground-level concentrations of (a) oxides 
of nitrogen and (b) ozone in the city of Sacramento in the Summer of 2005. 

The data is shown in Figure [TJ The goal is to predict ozone concentration from oxides of 
nitrogen. Both types of curves follow regular patterns, but some atypical curves can be 
discerned in the sample. We will show in Section |4] that to a large extend it is indeed pos- 
sible to predict ozone levels from oxides-of-nitrogen levels, but that the outlying curves 
distort the classical regression estimators and that the proposed robust method gives more 
reliable results. 

The paper is organized as follows. Section [2] presents a brief overview of functional 
linear regression and introduces the new method. Section[3]reports the results of a compar- 
ative simulation study, and Section |4] presents a detailed analysis of the above mentioned 
ozone dataset. Technical derivations and proofs are left to the Appendix. Matlab® pro- 
grams implementing these procedures are available on the author's webpage. 
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2 Method 



2.1 Background: classical functional linear regression 

The functional approach to longitudinal data analysis assumes that the observations (xi, yi), . . . , 
(x n , y n ) are discrete measurements of underlying continuous curves, so 

Xij Xi^Sij) -\- Siji i 1, . . . , 71, j 1, . . . , JTij, (1) 

Vij = Yi(Uj) + 4,-, i = 1, . . . , n, j = 1, . . . , m'i, (2) 

where {Xj(s)} and {Yi(t)} are the trajectories of interest, {%} and {e'^} are random mea- 
surement errors, and {s»j} and are the time points where the data is observed. The 
Xi(s)s and the Yi(t)s are random functions that we assume independent and identically 
distributed realizations of a pair (X(s), Y(t)). 

Suppose X(s) and Y(t) are square-integrable functions on an interval [a, b]. Define the 
norm ||/|| = {j b a f 2 (s)ds} 1 / 2 and the inner product (f,g) = f(s)g(s)ds. If E(||X|| 2 ) 
and E(||F|| 2 ) are finite, then X(s) and Y(t) admit the decomposition 

v 

X(s) = v x (s) + J2uMs), (3) 

k=l 
Q 

Y(t) = + (4) 

1=1 

known as the Karhunen-Loeve decomposition (Ash and Gardner 1975, ch. 1.4), where 
/jl x ( s ) — E{X(s)}, fx Y (t) — (0 fc (s)} and {tpi(t)} are orthonormal functions 

(i.e. (4> k i4> k ,) = 5kk' and (ip^ipy) = 5u>, where 5 is Kronecker's delta), and {U k } and 
{V;} are random variables with zero mean and finite variance (without loss of generality, 
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one can assume that var(Z7i) > var(C/ 2 ) > • • • > and var(Vi) > var(V^) > • • • > 0.) 
This is the functional equivalent of the principal-component decomposition in multivariate 
analysis, so the 0fc(s)s and ipi(t)s are called "principal components", and the Uf-s and VJs 
are called "component scores". In principle p and q in © and © could be infinite, but 
since E(\\X - /i x \\ 2 ) = Y7 k =i ™(U k ) and E(||F - /x y || 2 ) = J2Ui var (^) are finite ' tne 
sequences {var(f4)} and {var(V/)} usually decrease to zero fast enough that for practical 
purposes p and q can be assumed to be finite. 

Methods for estimating the mean and the principal components of X(s) and Y{t) can 
be found in Ramsay and Silverman (2005), James et al. (2000), and Yao et al. (2005b). 
These methods are not resistant to outliers, though; outlier-resistant estimators of the 
mean and principal components have been proposed by Locantore et al. (1999), Cuevas et 
al. (2007), and Gervini (2008, 2009). We will use the method of Gervini (2009) to estimate 
the mean and the principal components in © and ©. This method is briefly reviewed in 
the Appendix. 

Now suppose that there is a functional linear relationship between X(s) and Y(t): 



where a (t) is the intercept, (3 Q (s,t) the slope, and Z(t) the error term. We assume 
E{Z(t)} = and cov{X(s), Z(t)} = for all s and t. (Note that the Z is not neces- 
sarily white noise; it is just the portion of Y that is not explained by X, and it is usually a 
smooth non-trivial process.) Since © implies that /i y (t) = a (t) + (3 (s,t)/j, x (s)ds, 




(5) 
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we can rewrite © as 



y(t) = M*)+ / P (s t t){X(s)-fi x (s)}dB + Z(t). 



(6) 



■/ a 



Then the only parameter that remains to be estimated is the regression slope /3 . 

Since {4> k } is an orthonormal basis of the X-space and {^} is an orthonormal basis 
of the F-space, without loss of generality the regression slope can be expressed as 

ft(M) = tfu(^i(^ (7) 

k=l 1=1 



In matrix form, (3 (s,t) = (j)(s) T @ if)(t), where 0(s) = (<fii(s), . . . , 4> p (s)) T and if)(t) = 
(i^i(t), . . . , ip q (t)) T - If we a ls° collect the component scores and {VJ} into vectors 
U G R p and V G IR 9 , from ©, ©, © and © we obtain 

i/>(t) T V = f V (t) T ©o 0(s)0(s) r U ds + Z(t) 



where W 6 R' is the random vector with elements Wi = (Z,^). This reduces the 
functional regression model © to a simpler multivariate regression model, 



V>(tf 0^ U + il>{t) T W, 



V = 0^U + W, 



(8) 



and the problem now is to estimate the regression matrix O . 
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2.2 Outlier-resistant functional regression 

As explained above, given the data (x 1; yx), . . . , (x n , y n ) we use the reduced-rank t esti- 
mators of Gervini (2009) to obtain robust estimators of fi x , /i y , {4> k }, {i^i}, {Uik} and 
{Vu}. By © and ®, the least-squares estimator of f3 (s, t) would be 0(s) T ©i/'(i) with 

n n n 

= argmin V ||V 4 - T U 4 || 2 = (V U.Uf)- 1 V U.Vf. (9) 

However, this estimator is not robust. Although the reduced-rank t estimators of \x x , [i Y , 
{4> k } and {^} are robust, the component scores Uj and Vj are individual parameters that 
will be outliers if the corresponding curves Xi(s) and Yi(t) are outliers. Therefore, the 
estimator of ©o has to incorporate a mechanism to downweight outlying U^s and VjS. 

This can be accomplished, for instance, by a modification of the t-type GM-estimators 
of He et al. (2000), that we will call GMt for short. Let 

n 

(0, E) = argmin Vp{ W (U 4 )(V J - T U i ) T E- 1 (V i - T Ui)} + nlog |E| , (10) 

where = + q) log (1 +x/v). These are the maximum likelihood estimators of ©o 
and E when W in © follows a multivariate t distribution with mean zero and scatter 
matrix E /w(Uj), although we do not actually assume that W follows this distribution; 
as in He et al. (2000), this is just the motivation behind definition (flOl) . 
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It is shown in the Appendix that and £ satisfy the fixed-point equations 



© = <£p'teMUi)UiU?f J]p / (e,)«;(U i )U,V l T , (11) 



i=l J i=l 



1 " 



n . 
i=i 



(12) 



where Rj = Vj — © T U« and = u>(Uj)Rf £ _1 Rj. These equations can be solved 
iteratively by a reweighting algorithm. 

As for the weights iu(Uj), they are essentially a by-product of the estimation of 
{0 fc } and {Uj}. Since -E , (U i ) = and var(Uj) = diag(Ai, . . . , A p ), the UjS are approx- 
imately uncorrected with mean zero. The squared Mahalanobis distance of Uj is then 

= Y7k=i Ufk/^k, and large Dfs will correspond to X-outliers. The Dfs will follow an 
approximate Xp distribution if the data is Gaussian. 

This suggests a number of weighting schemes. One possibility is to use "metric" trim- 
ming, 

(13) 

0, otherwise, 

where xl,i- a * s the 1 — a quantile of the distribution. Another possibility is to use 
rank-based trimming, 



, 1. rank(A 2 )/n<l-a, 
«7(Ui) = <( (14) 

0, otherwise. 



The latter will always eliminate the an observations with largest Mahalanobis distances, 
even if they are not actual outliers; so we recommend not using an unnecessarily large a 
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for rank-based trimming. In practice, the choice of a can be based on the proportion of 
outliers observed in a boxplot or histogram of the Dfs. 

The estimator defined above belongs to the general class of M-estimators, which 
have well-known asymptotic properties (Van der Vaart, 1998, ch. 5). As shown in the 
Appendix, y/n{vec(G)— vec(@ )} follows an approximate N(0, A _1 BA _1 ) distribution 
for large n, with 

A = 2E{p"(e)w 2 (U)E 1 RR T <g>UU T } +I,®E{p'(e)«;(U)UU T } , (15) 
B = E[{p'(e)}V(U)RR T ®UU r ]. (16) 

The matrices A and B can be easily estimated, replacing expectations by averages. This 
asymptotic distribution can be used, for instance, to test significance of the regression: 
if ©o = O, Wald's statistic Q = nvec(©) AB Avec(@) follows an approximate xf> q 
distribution for large n, so we decide the regression is significant if Q > Xpg,i- a f° r a gi ven 
level a. We can also construct marginal tests and confidence intervals for the individual 
coefficients 6 ki- 
ln Section [3] we will study the accuracy of this asymptotic approximation. It is our 
experience that the distribution of © approaches normality quite fast, but the above "sand- 
wich formula" tends to underestimate the variance when the sample size n is small. In that 
case it is better to use bootstrap estimators of the covariance matrix of vec(@). 

3 Simulations 

In this section we study by simulation the finite- sample behavior of the estimators (flOl) . To 
this end, we generated data from model ® with U ~ N(0, A) and W ~ N(0, S), where 
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A = diag(l, 1/2, . . . , 1/p) and S = diag(l, 1/2, ... , l/q). Two regression parameters 
©o were considered: for the first set of simulations (to study estimation error) we took 
©o with #0,11 = 3 and 9 0ti j = for (i, j) ^ (1, 1); for the second set of simulations (to 
study the goodness of the asymptotic approximation of Wald's test) we took @ = O. The 
curves {Xi(s)} and {Yi(t)} were generated following © and ©, with /U x (s) and /i y (t) 
equal to zero, <p k (s) = y/2sm(kns) and ^i(t) = \/2sm(lirt), for s and t in [0, 1]. The raw 
observations were generated following (OQ) and ©, with random SijS uniformly distributed 
in [0, 1], {eij} and {e'^} independent N (0,0.01), and rrii = vn! i = m; for simplicity we 
took the grid {Uj} equal to {s^}. 

The first series of simulations were designed to study estimation error of the @s, both 
for clean and for outlier-contaminated data. We generated outliers by replacing [en] of the 
pairs (U<, V,) by (UJ, V?), with = U a + 5 and U* 3 = U i3 for j ^ 1, and V* = W, 
Note that the contaminated data (U*,V*) follows model © with @ = O and high- 
leverage U*s, so the effect of this type of contamination is an underestimation of o ,ii mat 
tends to pull 0(s, t) towards 0. 

The estimation of @ requires two steps: first, to estimate {Uj} and {Vj} from the raw 
data, and then to compute © from the UjS and the VjS. So we compared two procedures: a 
non-robust procedure, using reduced-rank Normal models (James et al., 2000) to estimate 
the component scores, followed by the ordinary least- squares regression estimator ©; and 
a robust procedure, using reduced-rank t-models (Gervini, 2009) to estimate the compo- 
nent scores, followed by the GMt regression estimator (flOl) . For the robust procedure, we 
considered the two types of weights u>(U) discussed in Section I2T21 with trimming pro- 
portions a = .10 and a = .50; degrees of freedom v = 1 and v = 5 were used for the 
t-models. 
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Four levels of contamination e were considered: (clean data), .10, .20 and .30. We 
took n = 50 as sample size, m = 20 as grid size, and p = q = 2 as model dimensions. 
Each case was replicated 1000 times. As measure of the estimation error we used the 
expected root integrated squared error E(||/3 — /3 ||), where — /3 || 2 = J Q J^{P(s, t) — 
/3 (s,t)} 2 ds dt. 

The results are reported in Tabled] along with Monte Carlo standard errors. We see 
that for non-contaminated data (e = 0), there is no significant difference between metric 
and rank trimming for a given pair (is, a). The trimming proportion a has a larger impact 
on the estimator's behavior than the degrees of freedom v. For this reason we recommend 
choosing a adaptively, so as not to cut off too much good data. When e > 0, we see that 
metric trimming tends to outperform rank trimming for a given pair (u, a). Somewhat 
counterintuitively, estimators with v — 5 tend to be more robust than those with v = 1 
for a given a; the reason is that for this type of contamination, which affects @ but not 
the <p k s or the fas, t models with v = 5 provide more accurate estimators of {U^} and 
{Vj} than t models with v — 1 (for other types of contamination this is no longer true, 
although t models with v — 5 are still very robust; see Gervini (2009).) In general, then, 
the recommendation is to use t-model estimators with metrically trimmed weights and a 
trimming proportion chosen adaptively. 

The second series of simulations were designed to assess the finite-sample adequacy 
of the asymptotic Wald test. To this end we generated data as before, but with @ = 
O. Then Q = nvec(@) r2 _1 vec(©) should approximately follow a xl Q distribution, 
where ft is the asymptotic covariance matrix of ^ l /nvec(@). For GMt estimators, Q is the 
"sandwich formula" given in Section [2i2l for the least-squares estimator, Q = E(RR T ) <g> 
{E(UU T )} _1 . Table |2] reports the tail probabilities P(Q > xl q ,i- a ) f° r m e usual values 
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Contamination proportion 
Estimator 0% 10% 20% 30% 



Least squares .293 (.004) 2.241 (.006) 2.644 (.048) 2.731 (.007) 

GMt, v = 1, a = .10 

Metric trim .472 (.006) .497 (.007) 1.316 (.028) 2.924 (.008) 

Rank trim .473 (.006) .469 (.007) 2.246 (.028) 2.941 (.006) 

GMt, v = 1, a = .50 

Metric trim .846 (.012) .800 (.013) 1.112 (.018) 1.756 (.022) 

Rank trim .832 (.012) .922 (.015) 1.212 (.018) 1.784 (.021) 

GMt, v = 5, a = .10 

Metric trim .379 (.005) .396 (.005) 1.493 (.023) 2.746 (.006) 

Rank trim .374 (.005) .395 (.006) 2.341 (.011) 2.792 (.005) 

GMt, v = 5, a = .50 

Metric trim .795 (.010) .666 (.011) .912 (.015) 1.494 (.021) 
Rank trim .783 (.010) .829 (.013) 1.054 (.017) 1.506 (.021) 

Table 1: Simulation Results. Mean root integrated squared errors of f3 under various 

contamination proportions (Monte Carlo standard errors in parenthesis). 
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Nominal probability 
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n 
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= 20, 


LS 


.1452 (.0035) 


.0813 (.0027) 


.0211 (.0014) 


P 


= q = 3 




GMt 


.2750 (.0045) 


.1900 (.0039) 


.0875 (.0028) 


n 


= 150, m 


= 20, 


LS 


.1316 (.0034) 


.0718 (.0026) 


.0144 (.0012) 


P 


= q = 3 




GMt 


.2111 (.0041) 


.1360 (.0034) 


.0514 (.0022) 
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= 200, m 


= 10, 


LS 


.1185 (.0032) 


.0625 (.0024) 


.0169 (.0013) 


P 


= q = 3 




GMt 


.1782 (.0038) 


.1122 (.0032) 


.0391 (.0019) 



Table 2: Simulation Results. Finite-sample tail probabilities of Wald's significance-of- 
regression test for nominal asymptotic probabilities .10, .05 and .01 (Monte Carlo standard 
errors in parenthesis). 

of a (.10, .05 and .01) and various combinations of parameters n, m, p and q. Each 
combination was replicated 10,000 times. We compared only two estimators this time: the 
least-squares estimator and the 10% metrically trimmed GMt estimator with v = 5. We 
see in Table [2] that the asymptotic x^ q approximation works reasonably well for the least- 
squares estimator if the ratio n/pq exceeds 15; however, for the GMt estimator a ratio 
n/pq of at least 35 is necessary for the asymptotic approximation to be reasonably good. 
Therefore, the asymptotic Wald test can be used with confidence only for large sample 
sizes and relatively small dimensions. In other cases, permutation tests or Wald tests with 
bootstrap-estimated covariances are preferable. 
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4 Application: Ozone Pollution Data 

Ground-level ozone is an air pollutant known to cause serious health problems. Unlike 
other pollutants, ozone is not emitted directly into the air but forms as a result of complex 
chemical reactions, including volatile organic compounds and oxides of nitrogen among 
other factors. Modeling ground-level ozone formation has been an active topic of air- 
quality studies for many years. The California Environmental Protection Agency database, 
available at http://www.arb.ca.gov/aqd/aqdcd/aqdcddld.htm, has collected data on hourly 
concentrations of pollutants at different locations in California for the years 1980 to 2009. 
Here we will focus on the trajectories of oxides of nitrogen (NOx) and ozone (03) in 
the city of Sacramento (site 301 1 in the database) between June 6 and August 26 of 2005, 
which make a total of 82 days (shown in FigureQ]). There are a few days with some missing 
observations (9 in total), but since the method can handle unequal time grids, imputation 
of the missing data was not necessary. 

The first step in the analysis is to fit reduced-rank models to the sample curves. We 
used cubic B-splines with 7 equally spaced knots every 5 years, and fitted Normal and t\ 
(Cauchy) reduced-rank models with up to 10 principal components. For both the response 
and the explanatory curves, the leading three components explain at least 85% of the total 
variability, so we retained these models. The means and the principal components are 
plotted in Figure [2l There is no substantial difference between the estimators obtained by 
these models, except perhaps for the mean and the third component of log-NOx (Figures 
|2|(a) and (g)). 

With the Normal component scores we computed the Least Squares estimator, obtain- 
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Figure 2: Ozone Example. Normal ( ) and Cauchy ( ) reduced-rank B-spline es- 
timators of the mean [(a),(b)], the first principal component [(c),(d)], the second principal 
component [(e), (f)] and the third principal component [(g),(h)] of log-NOx and root-03 
trajectories. 
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The latter cut off 5 observations out of the 82. There are some noticeable differences be- 
tween these two estimators, even leaving aside the third row (which are not easily compa- 
rable, since 4> LS 3 (s) and (f) GM3 (s) are rather different). The differences are more striking 
in the slope estimators f3 LS (s, t) and f3 GM (s, t), shown in Figure [3l There is a "bump" in 
(3 GM (s,t) around (s,t) = (8, 16) that does not appear in f3 LS (s,t). This means that the 
robust slope estimator assigns positive weight to NOx values around 8am in the prediction 
of 03 levels around 4pm, showing that there is a persistent effect of oxides-of-nitrogen 
level in ozone formation. 

Of course, none of this would be meaningful if the regression model was not statisti- 
cally significant. But the estimated response curves, shown in Figure H clearly show that 
the model does predict the response curves to a large extent. The robust estimator provides 
a better fit overall, with a root median squared error of .022 compared to the root median 
squared error of .023 for the least squares estimator. 
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Figure 3: Ozone Example. Functional slope estimators obtained by (a) least squares using 
Normal scores and (b) metric-trimmed GMt using Cauchy scores. 



Figure 4: Ozone Example. Daily trajectories of root-03 levels: (a) observed, (b) predicted 
by robust GMt estimator, and (c) predicted by least squares. 
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Appendix 
Reduced-rank t models 

The method proposed by Gervini (2009) to estimate the mean and the principal compo- 
nents of a stochastic process X works as follows. The mean function /i x and the princi- 
pal components {4> k } are modeled as spline functions; that is, given a set of spline basis 
functions b 1 , . . . , b N , chosen by the user, it is assumed that n x ( s ) = J2iLi £zM s ) and 
^ki s ) = J2iLi Vkfii( s )- The observed vector x, can then be expressed as 

x, = + B,HA 1/2 z, + ae t , 

where B, = H = fan • • • > %] and A = dia g(^i; • • • > \)- Note that U i = 

A 1 / 2 z i in this notation. By assuming (z i; £j) has a standard multivariate t distribution, ro- 
bust maximum likelihood estimators of {%}, {A fc } and a are obtained. The estimators 
are computed via a standard EM algorithm. The optimal number of components p can be 
chosen via AIC or BIC criteria. See Gervini (2009) for details. In addition to parame- 
ter estimates, the EM algorithm yields predictors of the random effects z i5 so one obtains 
i as a by-product. The estimators of fi Y , {^fc}* and {V;} are obtained in a 
similar way from the sample y l5 . . . , y n . 
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GMt estimating equations and asymptotics 

The estimators and E defined by (flOl ) are M-type estimators (Van der Vaart, 1998, 
ch. 5), since they minimize a function of the form M(0, E) = ^ XT=i m (©,s)(Uj, Vj). 
Specifically, 

mce.sjCU*, VO = ^(U,)^ - T U J ) T S- 1 (V J - T U 4 )} + log |E| . 

Then and E solve the equations ^M(0, E) = O and ^M(0, E) = O. To com- 
pute matrix derivatives we use the method of differentials (Magnus and Neudecker, 1999). 
Differentiating with respect to we obtain 

dm^s)^,^) = p'(e i )^(U l )2(V,, - T U i ) T S- 1 {-(d0) T U,} 
= -2p'(e i ) W (U,)tr{(d0) T U i (V i - T U l ) T S- 1 } 
= -2p / (e i ) W (U,)vec(d0) T vec{U i (V, - T U l ) T S- 1 }, 

where e, L = u7(Ui)(V, - T U i ) T E- 1 (V i - T Ui). Then 

V vec (e)m(e,s)(U i , V t ) = -2p , (e J ) W (U i )vec{U l (V i - T U i ) T E- 1 }, (17) 

which can be rearranged in matrix form as 

^rm (0|S) (U i) Vi) = -2p'(e J ) U ;(U l )U i (V i - ©^fS" 1 , 
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and (fTTT) follows. Differentiating m with respect to E we obtain 

dm(0 )S )(Ui, Vj) = 

= p / (e i )w(U i )(V, - T U l ) T {-S- 1 (dS)S- 1 }(V J - T U 4 ) + t^E^dE)} 

= - P '(e i )w(ij l )ti{i:- 1 (v i - e T Ui)(Vi - e T u i ) T s- 1 (ds)} + trfs-^dE)} 

= -p / (ei)w(U i )vec{S- 1 (Vi - T U,)(V, - T U J ) T S- 1 } T vec(dS) 
+vec(S- 1 ) T vec(dS), 

so 

V v ec(S)m (0iS )(Ui, Vj) 

= - /) / (e<)«'(Ui)vec{E- 1 (V < - T U,)(V. 4 - T U i ) T S- 1 } + vec^" 1 ). 
Again, this can be expressed in matrix form as 

d ~ « 

= -p'(e i )w(U i )S- 1 (V J - T U i )(V i - T U i ) T S- 1 + E~\ 

from which ([12]) follows. 

We will simplify the derivation of the asymptotic distribution of by assuming that 
the true component scores (Uj, Vj) are used, instead of the estimated scores (Uj,Vj), 
and by assuming that E is fixed and known. In that case we can apply Theorem 5.23 of 
Van der Vaart (1998) directly, and obtain that - v /n{vec(0)— vec(0 o )} is asymptotically 



19 



N(0, A^BA" 1 ) with 

A = E{V V ec(0)V^ c(0) m ( Oj s o )(U, V)} 

and 

B = E{V vec (©)'m(©o,So)(U, V)V^ ec (0)7T7,(0 Oi (U,V)}; 

these expectations are taken with respect to the true parameters (0 O , S ). Without loss of 
generality we can eliminate the factor 2S _1 in (fTTT) ; then it is easy to see that (fT6l) holds. 
To derive (fT5l) we use differentials again: 

d{VL(e)^(©,s )(U,V)} = 

= 2p"(e)w 2 (V)(V - T U) T S o 1 (de) T Uvec{U(V - T U) T } T 

+p'(e)w(U)vec{U(d0 T U) T } T 
= 2p"(e) W 2 (U)tr{(de) T U(V - T U) T £ 1 }vec{U(V - T U) T } T 

+p , (e)w(U)vec(UU T d0) T 
= 2p"(e)w 2 (U)vec(d0) T vec{U(V - t U) t Sq 1 }vec{U(V - T U) T } T 

+p'(e)w(U){(I q ® UU T )vec(d0)} T , 

so 

V ve c(©)V^ ec(@) m ( !So )(U, V) = 
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= 2p"(e)«; 2 (U)vec{U(V - T U) T S o 1 }vec{U(V - T U) T } T 

+ P '(e)w(U){I q ®UXJ T ) 
= 2p"(e)w 2 (U)(So X R g> U)(R g> U) T + p'(e)tw(U)(Ig g> UU T ), 

from which (TT3T ) follows. 
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